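Smartphones increasingly rely on biometric verification systems to secure highly sensitive applications. Audio-visual biometrics are gaining popularity because of their usability, and their multimodal nature makes them harder to spoof. In this work, we present an audio-visual smartphone dataset captured with five different recent smartphones. The new dataset contains 103 subjects captured in three different sessions that reflect different real-world scenarios. Three different languages are recorded in the dataset to cover the problem of language dependency in speaker recognition systems. These characteristics pave the way for novel state-of-the-art unimodal or audio-visual speaker recognition systems. We also report the performance of benchmarked biometric verification systems on our dataset. The robustness of the biometric algorithms is evaluated, through extensive experiments, against multiple dependencies such as signal noise, device, and language, and against presentation attacks such as replay and synthesized signals. The results raise many concerns about the generalization properties of state-of-the-art biometric methods on smartphones.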
In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks together with the front-view and top-view layouts of each shelf within a rack. With minimal effort, such an output is transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying numbers of objects per shelf and shelves per rack, including scenes with other racks in the background. Further, MVRackLay shows superior performance vis-a-vis its single-view counterpart, RackLay, in layout accuracy, quantified in terms of the mean IoU and mAP metrics. We also showcase a multi-view stitching of the 3D layouts, resulting in a representation of the warehouse scene with respect to a global reference frame, akin to a rendering of the scene from a SLAM pipeline. To the best of our knowledge, this is the first such work to portray a 3D rendering of a warehouse scene in terms of its semantic components - Racks, Shelves and Objects - all from a single monocular camera.
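The abstract above reports layout accuracy as mean IoU. As a rough illustration of that metric only, the following sketch computes intersection over union for binary layout masks and averages it over several layouts. The function names and the nested-list mask representation are assumptions made for this example; this is not the authors' evaluation code, and mAP is not covered.

```python
from typing import Sequence

# A layout mask is assumed to be a 2D grid of 0/1 occupancy values.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks with the same shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average IoU over paired predicted and ground-truth masks."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
print(iou(predicted, ground_truth))  # 2 shared cells out of 4 occupied cells
```

In a multi-layer setting, one such score could be computed per shelf layer and per view before averaging; how the layers are weighted is a design choice this sketch does not take from the abstract.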
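Each human driver has a distinct driving technique, knowledge, and emotional state as a result of unique driving traits. Driver drowsiness is a serious problem that endangers road safety, so an effective drowsiness detection algorithm is essential for preventing road accidents. A range of research efforts has addressed the detection of anomalous driver behaviour by examining the driver's frontal face and the vehicle dynamics with computer vision techniques. Conventional methods, however, cannot capture complex driver behaviour features. With the advent of deep learning architectures, a substantial amount of research has also analysed and recognized driver drowsiness using neural network algorithms. This paper introduces a novel framework based on vision transformers and YOLOv5 architectures for driver drowsiness recognition. A custom pre-trained YOLOv5 architecture is proposed for face extraction, with the aim of extracting the region of interest (ROI). To overcome the limitations of previous architectures, the paper introduces a vision transformer for binary image classification, which is trained and validated on the public UTA-RLDD dataset. The model achieves training and validation accuracies of 96.2% and 97.4%, respectively. For further evaluation, the proposed framework is tested on a custom dataset of 39 participants under various lighting conditions and achieves 95.5% accuracy. The experiments reveal the significant potential of our framework for practical applications in smart transportation systems.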
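Heterogeneous networks, which connect informative text-bearing nodes through different edge types, are routinely used to store and process information in various real-world applications. Graph Neural Networks (GNNs) and their hyperbolic variants provide a promising approach to encoding such networks in a low-dimensional latent space through neighborhood aggregation and hierarchical feature extraction, respectively. However, these approaches typically ignore metapath structures and the available semantic information. Furthermore, they are sensitive to noise in the training data. To tackle these limitations, we propose the Text Enriched Sparse Hyperbolic Graph Convolution Network (TESH-GCN), which captures the graph's metapath structures using semantic signals and further improves prediction in large heterogeneous graphs. In TESH-GCN, we extract semantic node information, which then acts as a connection signal for extracting the relevant nodes' local neighborhood and graph-level metapath features from the sparse adjacency tensor in a reformulated hyperbolic graph convolution layer. These extracted features, in conjunction with semantic features from the language model (for robustness), are used for the final downstream task. Experiments on various heterogeneous graph datasets show that our model outperforms the current state-of-the-art approaches by a large margin on the task of link prediction. We also report a reduction in both training time and model parameters compared to existing hyperbolic approaches, achieved through the reformulated hyperbolic graph convolution. Furthermore, we illustrate the robustness of our model by experimenting with different levels of simulated noise in both the graph structure and the text, and we present a mechanism for explaining TESH-GCN's predictions by analyzing the extracted metapaths.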
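Improving the quality of search results can significantly enhance users' experience and engagement with search engines. Despite recent advances in machine learning and data mining, correctly classifying items for a particular user search query remains a long-standing challenge with considerable room for improvement. This paper introduces the "Shopping Queries Dataset", a large dataset of difficult Amazon search queries and results, released to foster research on improving the quality of search results. The dataset contains approximately 130 thousand unique queries and 2.6 million manually labeled (query, product) relevance judgements. It is multilingual, with queries in English, Japanese, and Spanish. The Shopping Queries Dataset is used in one of the KDDCup'22 challenges. In this paper, we describe the dataset and present three evaluation tasks along with baseline results: (i) ranking the results list, (ii) classifying product results into relevance categories, and (iii) identifying substitute products for a given query. We anticipate that this data will become the gold standard for future research on product search.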
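This survey focuses on current problems in Earth system science to which machine learning algorithms can be applied. It provides an overview of previous work, ongoing work at the Ministry of Earth Sciences, Government of India, and future applications of ML algorithms to some significant Earth science problems. We provide a comparison of previous work with this survey, a mind map of the multidimensional areas related to machine learning, and a Gartner hype cycle for machine learning in Earth system science (ESS). We mainly focus on the critical components of the Earth sciences, including the atmosphere, the ocean, seismology, and the biosphere, and cover AI/ML applications to statistical downscaling and forecasting problems.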
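Deep Learning Recommendation Models (DLRM) are widespread, account for a considerable data center footprint, and grow by more than 1.5x per year. With model sizes soon to reach the terabyte range, leveraging Storage Class Memory (SCM) for inference enables lower power consumption and cost. This paper evaluates the major challenges in extending the memory hierarchy to SCM for DLRM and presents different techniques for improving performance through Software Defined Memory. We show how underlying technologies such as NAND Flash and 3DXP differ, and relate them to real-world scenarios, enabling power savings of 5% to 29%.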
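Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, and health care. However, training RL agents is very time consuming. Current implementations exhibit poor performance because of challenges such as irregular memory accesses and thread-level synchronization overheads on CPUs. In this work, we propose a framework for generating scalable reinforcement learning implementations on multi-core systems. The replay buffer is a key component of RL algorithms: it stores the samples obtained from environment interactions and supplies sampled data to the learning process. We define a new data structure for the prioritized replay buffer, based on a $K$-ary sum tree, that supports asynchronous parallel insertions, sampling, and priority updates. To address the challenge of irregular memory accesses, we propose a novel data layout for storing the nodes of the sum tree that reduces the number of cache misses. Additionally, we propose a lazy writing mechanism to reduce the thread-level synchronization overheads of the replay buffer operations. Our framework employs parallel actors to collect data concurrently through environment interactions, and parallel learners to perform stochastic gradient descent using the collected data. The framework supports a wide range of reinforcement learning algorithms, including DQN and DDPG. We demonstrate its effectiveness in accelerating RL algorithms through experiments on a CPU + GPU platform using OpenAI benchmarks.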
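To make the sum-tree idea in the abstract above concrete, here is a minimal single-threaded sketch of a K-ary sum tree for prioritized sampling. It is an illustration under stated assumptions, not the paper's data structure: the class name `KarySumTree`, the flat array layout, and the default fan-out of 4 are chosen for this example, and the asynchronous parallel operations, cache-aware node layout, and lazy writing described in the abstract are not implemented.

```python
import random


class KarySumTree:
    """Flat-array K-ary sum tree: leaves hold priorities, parents hold sums."""

    def __init__(self, capacity: int, k: int = 4) -> None:
        if capacity < 1 or k < 2:
            raise ValueError("capacity must be >= 1 and k must be >= 2")
        self.capacity = capacity
        self.k = k
        leaves = 1
        while leaves < capacity:  # round the leaf count up to a power of k
            leaves *= k
        # A complete k-ary tree with `leaves` leaves has (leaves - 1) / (k - 1)
        # internal nodes, stored before the leaves in the array.
        self.leaf_offset = (leaves - 1) // (k - 1)
        self.tree = [0.0] * (self.leaf_offset + leaves)

    def total(self) -> float:
        return self.tree[0]

    def update(self, index: int, priority: float) -> None:
        """Set the priority of one slot and refresh the sums on its root path."""
        if not 0 <= index < self.capacity:
            raise IndexError("slot index out of range")
        if priority < 0:
            raise ValueError("priority must be non-negative")
        node = self.leaf_offset + index
        self.tree[node] = float(priority)
        while node > 0:
            node = (node - 1) // self.k
            first = self.k * node + 1
            # Recomputing from the children avoids accumulating rounding drift.
            self.tree[node] = sum(self.tree[first:first + self.k])

    def find(self, mass: float) -> int:
        """Return the slot whose cumulative-priority interval contains mass."""
        if self.tree[0] <= 0.0:
            raise ValueError("no slot has positive priority")
        node = 0
        while node < self.leaf_offset:
            first = self.k * node + 1
            chosen = first
            for child in range(first, first + self.k):
                weight = self.tree[child]
                if weight <= 0.0:
                    continue  # never descend into an empty subtree
                chosen = child
                if mass < weight:
                    break
                mass -= weight
            node = chosen
        return node - self.leaf_offset

    def sample(self, rng: random.Random) -> int:
        """Draw a slot with probability proportional to its priority."""
        return self.find(rng.random() * self.total())


tree = KarySumTree(capacity=5, k=3)
for slot, priority in enumerate([1.0, 2.0, 3.0, 4.0, 0.0]):
    tree.update(slot, priority)
print(tree.total())    # sum of all priorities
print(tree.find(6.0))  # slot 3 owns the cumulative interval [6, 10)
```

A larger fan-out shortens the root-to-leaf path at the cost of scanning more children per level, which is one reason a K-ary tree can be friendlier to caches than a binary one when siblings are stored contiguously.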
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics that are comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential of designs that use a GI approach to allocate participants to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
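The learning-versus-earning trade-off described above can be illustrated with a small simulation. The sketch below does not compute the Gittins Index or the paper's modified rule; as a plainly labeled stand-in, it uses Thompson sampling with a Gamma prior on each arm's exponential rate and compares it with a fixed equal-allocation design. The function name `run_trial`, the Gamma(1, 1) prior, the arm means, and the sample size are all assumptions made for this example.

```python
import random


def run_trial(true_means, n_participants, adaptive, seed=0):
    """Simulate one experiment with exponentially distributed rewards.

    adaptive=True  -> Thompson sampling (a stand-in, NOT the Gittins Index rule)
    adaptive=False -> participants are assigned to arms in strict rotation
    Returns (allocation counts per arm, total reward earned).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    shape = [1.0] * n_arms  # Gamma posterior shape per arm (assumed prior: 1)
    rate = [1.0] * n_arms   # Gamma posterior rate per arm (assumed prior: 1)
    counts = [0] * n_arms
    total_reward = 0.0
    for i in range(n_participants):
        if adaptive:
            # Draw a plausible exponential rate for each arm from its posterior.
            # random.gammavariate takes (shape, scale), so scale = 1 / rate.
            draws = [rng.gammavariate(shape[a], 1.0 / rate[a]) for a in range(n_arms)]
            # The mean reward is 1 / rate, so the smallest drawn rate wins.
            arm = min(range(n_arms), key=lambda a: draws[a])
        else:
            arm = i % n_arms
        reward = rng.expovariate(1.0 / true_means[arm])
        # Conjugate update for exponential data: shape += 1, rate += observation.
        shape[arm] += 1.0
        rate[arm] += reward
        counts[arm] += 1
        total_reward += reward
    return counts, total_reward


fixed_counts, fixed_reward = run_trial([1.0, 3.0], 2000, adaptive=False, seed=1)
adaptive_counts, adaptive_reward = run_trial([1.0, 3.0], 2000, adaptive=True, seed=1)
print("equal allocation:", fixed_counts, round(fixed_reward, 1))
print("adaptive allocation:", adaptive_counts, round(adaptive_reward, 1))
```

The adaptive design should send most participants to the arm with the larger mean reward, which is the "earning" side of the trade-off; assessing the "learning" side (for example, statistical power) would require repeating the simulation many times and testing for a difference between arms.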
Modelling and forecasting real-life human behaviour using online social media is an active endeavour of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory that could be used to gauge and predict social behaviour. During the last decade, the user base of Twitter has been growing and becoming more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.